Biological interaction: Genetic factor(s) and environmental factor(s) participate in the same causal mechanism in the same individual (Rothman et al., 2008)
Statistical interaction using linear regression (unrelated individuals):
\(y = \mu + \beta_g x_g + \beta_e x_e + \beta_{int} x_g \times x_e + e\)
(Aschard et al., 2012, HumGen)
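A minimal sketch of fitting the full interaction model above by ordinary least squares. The function name `fit_interaction_model`, the simulated effect sizes, the allele frequency, and the binary exposure are illustrative assumptions, not values from the cited studies.

```python
import numpy as np

def fit_interaction_model(y, x_g, x_e):
    """OLS fit of y = mu + b_g*x_g + b_e*x_e + b_int*x_g*x_e + e.

    Returns (beta, se, z), each ordered as [mu, b_g, b_e, b_int].
    """
    X = np.column_stack([np.ones_like(x_g, dtype=float), x_g, x_e, x_g * x_e])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]           # residual degrees of freedom
    sigma2 = resid @ resid / dof            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the OLS estimates
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Simulated unrelated individuals (all parameter values are assumptions)
rng = np.random.default_rng(0)
n = 5000
x_g = rng.binomial(2, 0.3, size=n).astype(float)   # genotype coded 0/1/2
x_e = rng.binomial(1, 0.5, size=n).astype(float)   # binary exposure
y = 0.1 * x_g + 0.2 * x_e + 0.15 * x_g * x_e + rng.normal(size=n)

beta, se, z = fit_interaction_model(y, x_g, x_e)
print("beta_int = %.3f, se = %.3f, z = %.2f" % (beta[3], se[3], z[3]))
```

The last entry of `z` is the test statistic for \(\beta_{int}\); the other entries are the intercept and the two main effects.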
| Consortium | Sample size | Exposure | Outcome | Reference |
|---|---|---|---|---|
| CHARGE + SpiroMeta | 50,047 | Smoking | Pulmonary function | (Hancock et al., 2012) |
| SUNLIGHT | 35,000 | Vitamin D intake | Circulating Vitamin D level | (Wang et al., 2010) |
| GIANT | up to 339,224 | Gender | Anthropometric traits | (Heid et al., 2010) |
| … |
Most have used the full model (on the previous slide), but others have used a stratified approach (on the next slides).
Example: G x smoking in pulmonary function outcomes (Hancock et al., 2012)
Findings: three novel gene regions
Figure: G x ever-smoking in FEV1/FVC (Hancock et al., 2012)
Abbreviations: FEV1, Forced Expiratory Volume in 1 second; FVC, Forced Vital Capacity
Stratified GxE tests (Magi et al., 2010; Randall et al., 2013) are widely used in meta-analyses by large consortia
Example: G x gender in the Genetic Investigation of Anthropometric Traits (GIANT) consortium
Findings: 7 loci showed sex-specificity
Abbreviations: WHR, Waist-hip ratio
Power for an interaction test is much lower than for a marginal association test
It also faces other potential issues (Aschard et al., 2012, HumGen):
Relatedness is yet another layer of complexity in GxE analysis
Assess the relative performance of GxE methods in the presence of structure
Methods to account for relatedness are relatively well established in marginal association studies (GWAS)
| | Study design 1 | Study design 2 | Study design 3 |
|---|---|---|---|
| Sample | Family-based | Population-based | Population-based |
| Relationships | Kinship | | GRM |
| Method | Linear mixed models | Linear models | Linear mixed models |
GxE in study design 3 is our ongoing work (not presented today)
GxE in study designs 1 vs. 2 (today's focus)
Given: a population of 50,000 related samples (nuclear families)
Experiment: draw 5,000 unrelated samples, or draw samples at random
| relatedness | \(V\) | \(\Sigma_x\) | Normalization |
|---|---|---|---|
| unrelated | \(\sigma_g^2 K + \sigma_r^2 I = (\sigma_g^2 + \sigma_r^2) I\) | \(\sigma_x I\) | \(\sigma_g^2 + \sigma_r^2 = 1\) |
| genetically related | \(\sigma_g^2 K + \sigma_r^2 I\) | \(\sigma_x K\) | \(\sigma_g^2 + \sigma_r^2 = 1\) |
\(y = \mu + \beta_g x_g + g + f + e\)
\(\mbox{ } \mbox{ } = X \beta + g + f + e\)
\(\mbox{where } g \perp f \perp e\)
\(\mbox{implying}\)
\(y \sim (X \beta, \sigma_g^2 K + \sigma_f^2 F + \sigma_r^2 I) = (X \beta, V)\)
(Lynch and Walsh, 1998)
\(\hat{V} = \hat{\sigma_g^2} K + \hat{\sigma_f^2} F + \hat{\sigma_r^2} I\)
\(\hat{\beta} = (X^T \hat{V}^{-1} X)^{-1} X^T \hat{V}^{-1} y\)
\(\hat{\beta} \sim \mathcal{N}(\beta, (X^T \hat{V}^{-1} X)^{-1}) \mbox{ (approximately)}\)
\(var(\hat{\beta}_g) = ({x^*_g}^T \hat{V}^{-1} x^*_g)^{-1}\)
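A minimal sketch of the generalized least squares estimator above for nuclear families. It assumes the variance components are known rather than estimated (the slides use \(\hat{V}\)), and the family structure (two parents, two siblings), the variance values, and the name `gls_fit` are illustrative assumptions. Genotypes are drawn independently across individuals purely to keep the example short.

```python
import numpy as np

def gls_fit(y, X, V):
    """GLS: beta_hat = (X' V^-1 X)^-1 X' V^-1 y, cov = (X' V^-1 X)^-1."""
    Vinv_X = np.linalg.solve(V, X)      # V^-1 X without forming V^-1
    Vinv_y = np.linalg.solve(V, y)      # V^-1 y
    cov = np.linalg.inv(X.T @ Vinv_X)
    beta = cov @ (X.T @ Vinv_y)
    return beta, cov

# Relationship matrix for one nuclear family: 2 unrelated parents, 2 siblings
K_fam = np.array([[1.0, 0.0, 0.5, 0.5],
                  [0.0, 1.0, 0.5, 0.5],
                  [0.5, 0.5, 1.0, 0.5],
                  [0.5, 0.5, 0.5, 1.0]])
F_fam = np.ones((4, 4))                 # shared household indicator

n_fam = 100
K = np.kron(np.eye(n_fam), K_fam)       # block-diagonal across families
F = np.kron(np.eye(n_fam), F_fam)
N = K.shape[0]

# Assumed variance components: sigma_g^2, sigma_f^2, sigma_r^2
V = 0.4 * K + 0.1 * F + 0.5 * np.eye(N)

rng = np.random.default_rng(1)
x_g = rng.binomial(2, 0.3, size=N).astype(float)
X = np.column_stack([np.ones(N), x_g])  # columns: intercept, genotype
L = np.linalg.cholesky(V)
y = X @ np.array([0.0, 0.2]) + L @ rng.normal(size=N)

beta_hat, cov = gls_fit(y, X, V)
print("beta_g = %.3f, se = %.3f" % (beta_hat[1], np.sqrt(cov[1, 1])))
```

Passing the identity matrix as `V` reduces `gls_fit` to ordinary least squares, which is the fixed-effect LM of study design 2.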
The power in genetic association studies with linear models is a function of the non-centrality parameter (NCP)
\(NCP \approx \beta^2 tr(\hat{V}^{-1} \Sigma_x)\)
| Data | Model |
|---|---|
| phenotype | \(y \sim (X \beta, V) = (X \beta, \sigma_f^2 F + \sigma_g^2 K + \sigma_r^2 I)\) |
| genotypes | \(x \sim (\mu_x, \Sigma_x)\) |
The standard error from a fixed-effect LM applied to related individuals is not well calibrated (i.e. it is underestimated)
Compare the formulas:
For unrelated: \(NCP \approx \beta^2 \mbox{ } 2pq \mbox{ } N \mbox{; } (\sigma_g^2 + \sigma_r^2 = 1)\)
For genetically related: \(NCP \approx \beta^2 tr(\hat{V}^{-1} \Sigma_{x^*_g}) = \beta^2 \mbox{ } 2pq \mbox{ } tr((\hat{\sigma}_g^2 K + \hat{\sigma}_r^2 I)^{-1} K) \mbox{; } (\sigma_g^2 + \sigma_r^2 = 1)\)
(Visscher et al., 2008)
Our formula also lets us compare performance across other study designs
For household-related: \(NCP \approx \beta^2 tr(\hat{V}^{-1} \Sigma_{x^*_g}) = \beta^2 \mbox{ } 2pq \mbox{ } tr((\hat{\sigma}_f^2 F + \hat{\sigma}_r^2 I)^{-1})\)
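A rough numerical sketch of the three NCP formulas, evaluated for the same number of individuals. The family structure (two parents, two siblings), effect size, allele frequency, and variance components are arbitrary assumptions chosen only to make the formulas concrete; `ncp` is a hypothetical helper name.

```python
import numpy as np

def ncp(beta, V, Sigma_x):
    """NCP ~ beta^2 * tr(V^-1 Sigma_x)."""
    return beta**2 * np.trace(np.linalg.solve(V, Sigma_x))

beta, p = 0.1, 0.3            # assumed effect size and allele frequency
two_pq = 2 * p * (1 - p)      # genotype variance under Hardy-Weinberg

# One nuclear family: 2 unrelated parents and 2 siblings
K_fam = np.array([[1.0, 0.0, 0.5, 0.5],
                  [0.0, 1.0, 0.5, 0.5],
                  [0.5, 0.5, 1.0, 0.5],
                  [0.5, 0.5, 0.5, 1.0]])
F_fam = np.ones((4, 4))       # shared household

n_fam = 250
K = np.kron(np.eye(n_fam), K_fam)
F = np.kron(np.eye(n_fam), F_fam)
N = K.shape[0]
I = np.eye(N)

# Unrelated: V = I, Sigma_x = 2pq I, so NCP = beta^2 * 2pq * N
ncp_unrelated = ncp(beta, I, two_pq * I)
# Genetically related: V = sg2 K + sr2 I, Sigma_x = 2pq K (sg2 = sr2 = 0.5 assumed)
ncp_genetic = ncp(beta, 0.5 * K + 0.5 * I, two_pq * K)
# Household related: V = sf2 F + sr2 I, Sigma_x = 2pq I (sf2 = 0.2, sr2 = 0.8 assumed)
ncp_household = ncp(beta, 0.2 * F + 0.8 * I, two_pq * I)

print("unrelated: %.3f" % ncp_unrelated)
print("genetically related: %.3f" % ncp_genetic)
print("household related: %.3f" % ncp_household)
```

Using `np.linalg.solve` avoids forming \(\hat{V}^{-1}\) explicitly; for block-diagonal family data the trace could also be computed one family at a time.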
| Strata | Stratified interaction test | Reference |
|---|---|---|
| Independent | \(Z_{int} = \frac{\beta_m - \beta_f}{\sqrt{\sigma_{\beta_m}^2 + \sigma_{\beta_f}^2}} \sim \mathcal{N}(0, 1)\) | (Magi et al., 2010) |
| Related | \(Z_{int} = \frac{\beta_m - \beta_f}{\sqrt{\sigma_{\beta_m}^2 + \sigma_{\beta_f}^2 - 2 r \sigma_{\beta_m} \sigma_{\beta_f}}} \sim \mathcal{N}(0, 1)\) | (Randall et al., 2013) |
\(r\) is the Spearman correlation between the two stratum-specific effect estimates
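A small sketch of the stratified interaction test computed from per-stratum summary statistics. It uses the variance of a difference of two correlated estimates, \(\sigma_{\beta_m}^2 + \sigma_{\beta_f}^2 - 2 r \sigma_{\beta_m} \sigma_{\beta_f}\), which reduces to the independent-strata form when \(r = 0\). The function names and the input numbers are made up for illustration.

```python
import math

def stratified_interaction_z(beta_m, se_m, beta_f, se_f, r=0.0):
    """Z statistic for the difference between two stratum-specific effects.

    r is the correlation between the two estimates; r = 0 corresponds to
    independent strata.
    """
    var_diff = se_m**2 + se_f**2 - 2.0 * r * se_m * se_f
    return (beta_m - beta_f) / math.sqrt(var_diff)

def two_sided_p(z):
    """Two-sided p-value under the standard normal null distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical per-stratum estimates and standard errors
z_indep = stratified_interaction_z(0.30, 0.05, 0.10, 0.05)
z_rel = stratified_interaction_z(0.30, 0.05, 0.10, 0.05, r=0.5)
print("independent strata: z = %.3f, p = %.2e" % (z_indep, two_sided_p(z_indep)))
print("correlated strata:  z = %.3f, p = %.2e" % (z_rel, two_sided_p(z_rel)))
```

Ignoring a positive correlation between strata overstates the variance of the difference, so the independent-strata statistic is conservative in that case.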
The Genetic Analysis of Idiopathic Thrombophilia 2 (GAIT2) Project
Developed tools for analysis of family-based samples
COPDGene dataset
Previous studies reported
The project aims to leverage ancestry information in GxE tests